{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas基础"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Pandas](http://pandas.pydata.org/)简介\n",
    "\n",
    "> * python数据分析library\n",
    "> * 基于numpy (对ndarray的操作)\n",
    "> * 有一种用python做Excel/SQL/R的感觉\n",
    "\n",
    "## 目录\n",
    "> * Series\n",
    "> * DataFrame\n",
    "> * Index, reindex and hierarchical indexing\n",
    "> * Merge, Join, Concatenate, Groupby and Aggregate\n",
    "> * Read from csv\n",
    "> * 随堂案例：bikes routes counts\n",
    "\n",
    "## 数据结构Series\n",
    "\n",
    "### 构造和初始化Series\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Series是一个一维的数据结构 对list的封装"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0                  7\n",
       "1            Beijing\n",
       "2               2.17\n",
       "3              -1232\n",
       "4    Happy birthday!\n",
       "dtype: object"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([7, 'Beijing', 2.17, -1232, 'Happy birthday!'])\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pandas会默认用0到n来作为Series的index，但是我们也可以自己指定index。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([7, 'Beijing', 2.17, -1232, 'Happy birthday!'], index = ['A','B','C','D','E'])\n",
    "type(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "还可以用dictionary来构造一个Series，因为Series本来就是key value pairs。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Beijing      55000.0\n",
      "Guangzhou    25000.0\n",
      "Hangzhou     20000.0\n",
      "Shanghai     60000.0\n",
      "Shenzhen     50000.0\n",
      "Suzhou           NaN\n",
      "dtype: float64\n",
      "<class 'pandas.core.series.Series'>\n"
     ]
    }
   ],
   "source": [
    "cities = {'Beijing': 55000, 'Shanghai': 60000, 'Shenzhen': 50000, 'Hangzhou': 20000, \\\n",
    "          'Guangzhou': 25000, 'Suzhou': None}\n",
    "apts = pd.Series(cities)\n",
    "print(apts)\n",
    "print(type(apts))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 选择数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "前面定义的index就是用来选择数据的"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20000.0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts[\"Hangzhou\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Hangzhou    20000.0\n",
       "Beijing     55000.0\n",
       "Shenzhen    50000.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts[[\"Hangzhou\",\"Beijing\",\"Shenzhen\"]]\n",
    "# type(apts[[\"Hangzhou\", \"Beijing\", \"Shenzhen\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "大家还记得上次numpy讲到的boolean indexing吗？pandas当中也可以使用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Guangzhou    25000.0\n",
       "Hangzhou     20000.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts[apts < 50000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Beijing      False\n",
      "Guangzhou     True\n",
      "Hangzhou      True\n",
      "Shanghai     False\n",
      "Shenzhen     False\n",
      "Suzhou       False\n",
      "dtype: bool\n"
     ]
    }
   ],
   "source": [
    "less_than_50000 = apts < 50000\n",
    "print(less_than_50000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Guangzhou    25000.0\n",
       "Hangzhou     20000.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts[less_than_50000]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面我再详细展示一下这个boolean indexing是如何工作的"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series元素赋值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('Old value:', 50000.0)\n"
     ]
    }
   ],
   "source": [
    "print(\"Old value:\",apts[\"Shenzhen\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "apts[\"Shenzhen\"] = 55000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "55000.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts[\"Shenzhen\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Guangzhou    25000.0\n",
      "Hangzhou     20000.0\n",
      "dtype: float64\n",
      "Beijing      55000.0\n",
      "Guangzhou    40000.0\n",
      "Hangzhou     40000.0\n",
      "Shanghai     60000.0\n",
      "Shenzhen     55000.0\n",
      "Suzhou           NaN\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(apts[apts < 50000])\n",
    "apts[apts <= 50000] = 40000\n",
    "print(apts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数学运算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing      27500.0\n",
       "Guangzhou    20000.0\n",
       "Hangzhou     20000.0\n",
       "Shanghai     30000.0\n",
       "Shenzhen     27500.0\n",
       "Suzhou           NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts / 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing      110000.0\n",
       "Guangzhou     80000.0\n",
       "Hangzhou      80000.0\n",
       "Shanghai     120000.0\n",
       "Shenzhen     110000.0\n",
       "Suzhou            NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts * 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing      3.025000e+09\n",
       "Guangzhou    1.600000e+09\n",
       "Hangzhou     1.600000e+09\n",
       "Shanghai     3.600000e+09\n",
       "Shenzhen     3.025000e+09\n",
       "Suzhou                NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.square(apts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing      3.025000e+09\n",
       "Guangzhou    1.600000e+09\n",
       "Hangzhou     1.600000e+09\n",
       "Shanghai     3.600000e+09\n",
       "Shenzhen     3.025000e+09\n",
       "Suzhou                NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts ** 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们再定义一个新的Series做加法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing      300000\n",
       "Chongqin     150000\n",
       "Guangzhou    200000\n",
       "Shanghai     400000\n",
       "Shenzhen     300000\n",
       "Tianjin      200000\n",
       "dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cars = pd.Series({'Beijing':300000, 'Shanghai':400000, 'Shenzhen':300000, \\\n",
    "                     'Tianjin':200000, 'Guangzhou':200000, 'Chongqin':150000})\n",
    "cars"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Beijing      5800000.0\n",
      "Chongqin           NaN\n",
      "Guangzhou    4200000.0\n",
      "Hangzhou           NaN\n",
      "Shanghai     6400000.0\n",
      "Shenzhen     5800000.0\n",
      "Suzhou             NaN\n",
      "Tianjin            NaN\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(cars + apts * 100) #不重叠的相加都是NaN"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数据缺失"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'Hangzhou' in apts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'Hangzhou' in cars"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing       True\n",
       "Guangzhou     True\n",
       "Hangzhou      True\n",
       "Shanghai      True\n",
       "Shenzhen      True\n",
       "Suzhou       False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts.notnull()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing      False\n",
       "Guangzhou    False\n",
       "Hangzhou     False\n",
       "Shanghai     False\n",
       "Shenzhen     False\n",
       "Suzhou        True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts.isnull()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Suzhou   NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts[apts.isnull()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing      55000.0\n",
       "Guangzhou    40000.0\n",
       "Hangzhou     40000.0\n",
       "Shanghai     60000.0\n",
       "Shenzhen     55000.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts[apts.notnull()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing      55000.0\n",
       "Guangzhou    40000.0\n",
       "Hangzhou     40000.0\n",
       "Shanghai     60000.0\n",
       "Shenzhen     55000.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apts[apts.isnull() == False]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据结构[Dataframe](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)\n",
    "\n",
    "一个Dataframe就是一张表格，Series表示的是一维数组，Dataframe则是一个二维数组，可以类比成一张excel的spreadsheet。也可以把Dataframe当做一组Series的集合。\n",
    "\n",
    "### 创建一个DataFrame\n",
    "\n",
    "\n",
    "dataframe可以由一个dictionary构造得到。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>population</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Beijing</td>\n",
       "      <td>2100</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Shanghai</td>\n",
       "      <td>2300</td>\n",
       "      <td>2017</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>1000</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>700</td>\n",
       "      <td>2017</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hangzhou</td>\n",
       "      <td>500</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Chongqin</td>\n",
       "      <td>500</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        city  population  year\n",
       "0    Beijing        2100  2016\n",
       "1   Shanghai        2300  2017\n",
       "2  Guangzhou        1000  2016\n",
       "3   Shenzhen         700  2017\n",
       "4   Hangzhou         500  2016\n",
       "5   Chongqin         500  2016"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {'city':['Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen', 'Hangzhou', 'Chongqin'],\n",
    "       'year':[2016, 2017, 2016, 2017, 2016, 2016],\n",
    "       'population': [2100, 2300, 1000, 700, 500, 500]}\n",
    "pd.DataFrame(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>city</th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2016</td>\n",
       "      <td>Beijing</td>\n",
       "      <td>2100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>2300</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2016</td>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>1000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>700</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2016</td>\n",
       "      <td>Hangzhou</td>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2016</td>\n",
       "      <td>Chongqin</td>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year       city  population\n",
       "0  2016    Beijing        2100\n",
       "1  2017   Shanghai        2300\n",
       "2  2016  Guangzhou        1000\n",
       "3  2017   Shenzhen         700\n",
       "4  2016   Hangzhou         500\n",
       "5  2016   Chongqin         500"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#columns的名字和顺序可以指定\n",
    "pd.DataFrame(data, columns = ['year', 'city', 'population'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>city</th>\n",
       "      <th>population</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2016</td>\n",
       "      <td>Beijing</td>\n",
       "      <td>2100</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>2300</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2016</td>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>1000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>700</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2016</td>\n",
       "      <td>Hangzhou</td>\n",
       "      <td>500</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2016</td>\n",
       "      <td>Chongqin</td>\n",
       "      <td>500</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year       city  population debt\n",
       "0  2016    Beijing        2100  NaN\n",
       "1  2017   Shanghai        2300  NaN\n",
       "2  2016  Guangzhou        1000  NaN\n",
       "3  2017   Shenzhen         700  NaN\n",
       "4  2016   Hangzhou         500  NaN\n",
       "5  2016   Chongqin         500  NaN"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(data, columns = ['year', 'city', 'population', 'debt'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       year       city  population debt\n",
      "one    2016    Beijing        2100  NaN\n",
      "two    2017   Shanghai        2300  NaN\n",
      "three  2016  Guangzhou        1000  NaN\n",
      "four   2017   Shenzhen         700  NaN\n",
      "five   2016   Hangzhou         500  NaN\n",
      "six    2016   Chongqin         500  NaN\n"
     ]
    }
   ],
   "source": [
    "frame2 = pd.DataFrame(data, columns = ['year', 'city', 'population', 'debt'], index = ['one', 'two', 'three', 'four', 'five', 'six'])\n",
    "print(frame2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 从DataFrame里选择数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "one        Beijing\n",
       "two       Shanghai\n",
       "three    Guangzhou\n",
       "four      Shenzhen\n",
       "five      Hangzhou\n",
       "six       Chongqin\n",
       "Name: city, dtype: object"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2['city']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(frame2['city'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "one        Beijing\n",
      "two       Shanghai\n",
      "three    Guangzhou\n",
      "four      Shenzhen\n",
      "five      Hangzhou\n",
      "six       Chongqin\n",
      "Name: city, dtype: object\n"
     ]
    }
   ],
   "source": [
    "print(frame2.city)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "year               2016\n",
       "city          Guangzhou\n",
       "population         1000\n",
       "debt                NaN\n",
       "Name: three, dtype: object"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.ix['three']\n",
    "\n",
    "# frame2.loc['three']\n",
    "\n",
    "# frame2.iloc['three'] # 不可用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "year               2016\n",
      "city          Guangzhou\n",
      "population         1000\n",
      "debt                NaN\n",
      "Name: three, dtype: object\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    }
   ],
   "source": [
    "print(frame2.ix[2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### DataFrame元素赋值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame\n",
      "\n",
      "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    }
   ],
   "source": [
    "frame2[\"population\"][\"one\"] = 2200"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>city</th>\n",
       "      <th>population</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>2016</td>\n",
       "      <td>Beijing</td>\n",
       "      <td>2200</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>2300</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td>2016</td>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>1000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>700</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td>2016</td>\n",
       "      <td>Hangzhou</td>\n",
       "      <td>500</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>six</th>\n",
       "      <td>2016</td>\n",
       "      <td>Chongqin</td>\n",
       "      <td>500</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year       city  population debt\n",
       "one    2016    Beijing        2200  NaN\n",
       "two    2017   Shanghai        2300  NaN\n",
       "three  2016  Guangzhou        1000  NaN\n",
       "four   2017   Shenzhen         700  NaN\n",
       "five   2016   Hangzhou         500  NaN\n",
       "six    2016   Chongqin         500  NaN"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>city</th>\n",
       "      <th>population</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>2016</td>\n",
       "      <td>Beijing</td>\n",
       "      <td>2200</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>2300</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td>2016</td>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>1000</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>700</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td>2016</td>\n",
       "      <td>Hangzhou</td>\n",
       "      <td>500</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>six</th>\n",
       "      <td>2016</td>\n",
       "      <td>Chongqin</td>\n",
       "      <td>500</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year       city  population       debt\n",
       "one    2016    Beijing        2200  100000000\n",
       "two    2017   Shanghai        2300  100000000\n",
       "three  2016  Guangzhou        1000  100000000\n",
       "four   2017   Shenzhen         700  100000000\n",
       "five   2016   Hangzhou         500  100000000\n",
       "six    2016   Chongqin         500  100000000"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#可以给一整列赋值\n",
    "frame2['debt'] = 100000000\n",
    "frame2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>city</th>\n",
       "      <th>population</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>2016</td>\n",
       "      <td>Beijing</td>\n",
       "      <td>2200</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>2300</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td>2016</td>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>1000</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>700</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td>2016</td>\n",
       "      <td>Hangzhou</td>\n",
       "      <td>500</td>\n",
       "      <td>100000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>six</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year       city  population       debt\n",
       "one    2016    Beijing        2200  100000000\n",
       "two    2017   Shanghai        2300  100000000\n",
       "three  2016  Guangzhou        1000  100000000\n",
       "four   2017   Shenzhen         700  100000000\n",
       "five   2016   Hangzhou         500  100000000\n",
       "six       0          0           0          0"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.ix['six'] = 0\n",
    "frame2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       year       city  population debt\n",
      "one    2016    Beijing        2100  NaN\n",
      "two    2017   Shanghai        2300  NaN\n",
      "three  2016  Guangzhou        1000  NaN\n",
      "four   2017   Shenzhen         700  NaN\n",
      "five   2016   Hangzhou         500  NaN\n",
      "six    2016   Chongqin         500  NaN\n"
     ]
    }
   ],
   "source": [
    "frame2 = pd.DataFrame(data, \\\n",
    "                     columns = ['year', 'city', 'population', 'debt'],\n",
    "                     index = ['one','two','three','four','five','six'])\n",
    "print(frame2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>city</th>\n",
       "      <th>population</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>2016</td>\n",
       "      <td>Beijing</td>\n",
       "      <td>2100</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>2300</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td>2016</td>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>1000</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td>2017</td>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>700</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td>2016</td>\n",
       "      <td>Hangzhou</td>\n",
       "      <td>500</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>six</th>\n",
       "      <td>2016</td>\n",
       "      <td>Chongqin</td>\n",
       "      <td>500</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year       city  population  debt\n",
       "one    2016    Beijing        2100     0\n",
       "two    2017   Shanghai        2300     1\n",
       "three  2016  Guangzhou        1000     2\n",
       "four   2017   Shenzhen         700     3\n",
       "five   2016   Hangzhou         500     4\n",
       "six    2016   Chongqin         500     5"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.debt = np.arange(6)\n",
    "frame2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "还可以用Series来指定需要修改的index以及相对应的value，没有指定的默认用NaN."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       year       city  population   debt\n",
      "one    2016    Beijing        2100    NaN\n",
      "two    2017   Shanghai        2300  200.0\n",
      "three  2016  Guangzhou        1000  300.0\n",
      "four   2017   Shenzhen         700    NaN\n",
      "five   2016   Hangzhou         500  500.0\n",
      "six    2016   Chongqin         500    NaN\n"
     ]
    }
   ],
   "source": [
    "val = pd.Series([200, 300, 500], index = ['two', 'three', 'five'])\n",
    "val\n",
    "frame2['debt'] = val\n",
    "print(frame2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       year       city  population   debt  western\n",
      "one    2016    Beijing        2100    NaN    False\n",
      "two    2017   Shanghai        2300  200.0    False\n",
      "three  2016  Guangzhou        1000  300.0    False\n",
      "four   2017   Shenzhen         700    NaN    False\n",
      "five   2016   Hangzhou         500  500.0    False\n",
      "six    2016   Chongqin         500    NaN    False\n"
     ]
    }
   ],
   "source": [
    "frame2['western'] = (frame2.city == 'Chongqing')\n",
    "print(frame2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index([u'year', u'city', u'population', u'debt', u'western'], dtype='object')"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index([u'one', u'two', u'three', u'four', u'five', u'six'], dtype='object')"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个DataFrame就和一个numpy 2d array一样，可以被转置"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "      <th>three</th>\n",
       "      <th>four</th>\n",
       "      <th>five</th>\n",
       "      <th>six</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <td>2016</td>\n",
       "      <td>2017</td>\n",
       "      <td>2016</td>\n",
       "      <td>2017</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <td>Beijing</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>Hangzhou</td>\n",
       "      <td>Chongqin</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>population</th>\n",
       "      <td>2100</td>\n",
       "      <td>2300</td>\n",
       "      <td>1000</td>\n",
       "      <td>700</td>\n",
       "      <td>500</td>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>debt</th>\n",
       "      <td>NaN</td>\n",
       "      <td>200</td>\n",
       "      <td>300</td>\n",
       "      <td>NaN</td>\n",
       "      <td>500</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>western</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                one       two      three      four      five       six\n",
       "year           2016      2017       2016      2017      2016      2016\n",
       "city        Beijing  Shanghai  Guangzhou  Shenzhen  Hangzhou  Chongqin\n",
       "population     2100      2300       1000       700       500       500\n",
       "debt            NaN       200        300       NaN       500       NaN\n",
       "western       False     False      False     False     False     False"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      Beijing  Shanghai\n",
      "2015      NaN      2400\n",
      "2016   2100.0      2500\n",
      "2017   2200.0      2600\n"
     ]
    }
   ],
   "source": [
    "pop = {'Beijing':{2016: 2100, 2017:2200},\n",
    "      'Shanghai':{2015: 2400, 2016:2500, 2017:2600}}\n",
    "frame3 = pd.DataFrame(pop)\n",
    "print(frame3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            2015    2016    2017\n",
      "Beijing      NaN  2100.0  2200.0\n",
      "Shanghai  2400.0  2500.0  2600.0\n"
     ]
    }
   ],
   "source": [
    "print(frame3.T)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "指定index的顺序，以及使用切片初始化数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2016    2100.0\n",
       "2017    2200.0\n",
       "Name: Beijing, dtype: float64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame3['Beijing'][1:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2015    2400\n",
      "2016    2500\n",
      "Name: Shanghai, dtype: int64\n",
      "2015    2400\n",
      "Name: Shanghai, dtype: int64\n",
      "Series([], Name: Shanghai, dtype: int64)\n",
      "2015    2400\n",
      "2016    2500\n",
      "2017    2600\n",
      "Name: Shanghai, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print frame3['Shanghai'][:-1]\n",
    "print frame3['Shanghai'][:-2]\n",
    "print frame3['Shanghai'][:-3]\n",
    "print frame3['Shanghai']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      Beijing  Shanghai\n",
      "2015      NaN      2400\n",
      "2016   2100.0      2500\n"
     ]
    }
   ],
   "source": [
    "pdata = {'Beijing':frame3['Beijing'][:-1],'Shanghai':frame3['Shanghai'][:-1]}\n",
    "print(pd.DataFrame(pdata))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>city</th>\n",
       "      <th>Beijing</th>\n",
       "      <th>Shanghai</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>NaN</td>\n",
       "      <td>2400</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>2100.0</td>\n",
       "      <td>2500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>2200.0</td>\n",
       "      <td>2600</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "city  Beijing  Shanghai\n",
       "year                   \n",
       "2015      NaN      2400\n",
       "2016   2100.0      2500\n",
       "2017   2200.0      2600"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame3.index.name = 'year'\n",
    "frame3.columns.name = 'city'\n",
    "frame3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.ndarray"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(frame3.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Index\n",
    "\n",
    "\n",
    "### index object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Index([u'a', u'b', u'c'], dtype='object')\n",
      "Index([u'b', u'c'], dtype='object')\n"
     ]
    }
   ],
   "source": [
    "obj = pd.Series(range(3), index=['a', 'b', 'c'])\n",
    "index = obj.index\n",
    "print(index)\n",
    "print(index[1:])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "Index does not support mutable operations",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-55-a12c5cbc3e9e>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[1;31m#index的值是不能被更改的\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mindex\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'd'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;32mF:\\Program\\Anaconda2\\lib\\site-packages\\pandas\\core\\indexes\\base.pyc\u001b[0m in \u001b[0;36m__setitem__\u001b[1;34m(self, key, value)\u001b[0m\n\u001b[0;32m   2048\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   2049\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0m__setitem__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2050\u001b[1;33m         \u001b[1;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"Index does not support mutable operations\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   2051\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   2052\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0m__getitem__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mTypeError\u001b[0m: Index does not support mutable operations"
     ]
    }
   ],
   "source": [
    "#index的值是不能被更改的\n",
    "index[1] = 'd'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0    2\n",
      "1    5\n",
      "2    7\n",
      "dtype: int64\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "index = pd.Index(np.arange(3))\n",
    "index\n",
    "obj2 = pd.Series([2, 5, 7], index=index)\n",
    "print(obj2)\n",
    "print(obj2.index is index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      Beijing  Shanghai\n",
      "2015      NaN      2400\n",
      "2016   2100.0      2500\n",
      "2017   2200.0      2600\n",
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "pop = {'Beijing': {2016: 2100, 2017:2200},\n",
    "      'Shanghai': {2015:2400, 2016:2500, 2017:2600}}\n",
    "frame3 = pd.DataFrame(pop)\n",
    "print(frame3)\n",
    "print('Shanghai' in frame3.columns)\n",
    "print(2015 in frame3.columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 针对index进行索引和切片"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a    0\n",
      "b    1\n",
      "c    2\n",
      "d    3\n",
      "dtype: int32\n"
     ]
    }
   ],
   "source": [
    "obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])\n",
    "print(obj)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    1\n",
       "a    0\n",
       "dtype: int32"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj[['b', 'a']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0\n",
       "c    2\n",
       "dtype: int32"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj[[0, 2]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 对Series进行切片"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    1\n",
       "c    2\n",
       "dtype: int32"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj[1:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a    0\n",
      "b    5\n",
      "c    5\n",
      "d    5\n",
      "dtype: int32\n"
     ]
    }
   ],
   "source": [
    "obj['b':'d'] = 5\n",
    "print(obj)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Hangzhou  Shenzhen  Nanjing\n",
      "a         0         1        2\n",
      "c         3         4        5\n",
      "d         6         7        8\n"
     ]
    }
   ],
   "source": [
    "#对DataFrame进行Indexing与Series基本相同\n",
    "frame = pd.DataFrame(np.arange(9).reshape(3, 3),\n",
    "                    index = ['a', 'c', 'd'],\n",
    "                    columns = ['Hangzhou', 'Shenzhen', 'Nanjing'])\n",
    "print(frame)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0\n",
       "c    3\n",
       "d    6\n",
       "Name: Hangzhou, dtype: int32"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame['Hangzhou']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Hangzhou  Shenzhen  Nanjing\n",
       "a         0         1        2\n",
       "c         3         4        5"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Hangzhou  Shenzhen  Nanjing\n",
       "a         0         1        2\n",
       "c         3         4        5\n",
       "d         6         7        8"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.ix['a':'d']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [Hangzhou, Shenzhen, Nanjing]\n",
       "Index: []"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame['Hangzhou':'Nanjing']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Shenzhen  Nanjing\n",
       "a         1        2\n",
       "c         4        5\n",
       "d         7        8"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.ix[:,'Shenzhen':'Nanjing']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "a    0\n",
       "c    3\n",
       "Name: Hangzhou, dtype: int32"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.ix[:'c','Hangzhou']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Hangzhou  Shenzhen  Nanjing\n",
       "c         3         4        5\n",
       "d         6         7        8"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame[frame.Hangzhou > 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Hangzhou  Shenzhen  Nanjing\n",
      "a         0         0        0\n",
      "c         0         0        5\n",
      "d         6         7        8\n"
     ]
    }
   ],
   "source": [
    "frame[frame < 5] = 0\n",
    "print(frame)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [reindex](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html)\n",
    "\n",
    "把一个Series或者DataFrame按照新的index顺序进行重排"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "d    4.5\n",
      "b    7.2\n",
      "a   -5.3\n",
      "c    3.2\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "obj = pd.Series([4.5, 7.2, -5.3, 3.2], index=['d', 'b', 'a', 'c'])\n",
    "print(obj)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a   -5.3\n",
       "b    7.2\n",
       "c    3.2\n",
       "d    4.5\n",
       "e    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.reindex(['a', 'b', 'c', 'd', 'e'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a   -5.3\n",
       "b    7.2\n",
       "c    3.2\n",
       "d    4.5\n",
       "e    0.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value = 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0      blue\n",
      "2    purple\n",
      "4    yellow\n",
      "dtype: object\n"
     ]
    }
   ],
   "source": [
    "obj3 = pd.Series(['blue', 'purple', 'yellow'], index = [0, 2, 4])\n",
    "print(obj3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      blue\n",
       "1      blue\n",
       "2    purple\n",
       "3    purple\n",
       "4    yellow\n",
       "5    yellow\n",
       "dtype: object"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj3.reindex(range(6), method = 'ffill')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      blue\n",
       "1    purple\n",
       "2    purple\n",
       "3    yellow\n",
       "4    yellow\n",
       "5       NaN\n",
       "dtype: object"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj3.reindex(range(6),method = 'bfill')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "既然我们可以对Series进行reindex，相应地，我们也可以用同样的方法对DataFrame进行reindex。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Hangzhou  Shenzhen  Nanjing\n",
      "a         0         1        2\n",
      "c         3         4        5\n",
      "d         6         7        8\n"
     ]
    }
   ],
   "source": [
    "frame = pd.DataFrame(np.arange(9).reshape(3,3),\n",
    "                    index = ['a', 'c', 'd'],\n",
    "                    columns = ['Hangzhou','Shenzhen','Nanjing'])\n",
    "print(frame)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Hangzhou  Shenzhen  Nanjing\n",
       "a       0.0       1.0      2.0\n",
       "b       NaN       NaN      NaN\n",
       "c       3.0       4.0      5.0\n",
       "d       6.0       7.0      8.0"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.reindex(['a', 'b', 'c', 'd'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Chongqing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>7</td>\n",
       "      <td>6</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Shenzhen  Hangzhou  Chongqing\n",
       "a         1         0        NaN\n",
       "c         4         3        NaN\n",
       "d         7         6        NaN"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#在reindex的同时，我们还可以重新指定columns\n",
    "frame.reindex(columns = ['Shenzhen', 'Hangzhou', 'Chongqing'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Chongqing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>4.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>7.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Shenzhen  Hangzhou  Chongqing\n",
       "a       1.0       0.0        NaN\n",
       "b       1.0       0.0        NaN\n",
       "c       4.0       3.0        NaN\n",
       "d       7.0       6.0        NaN"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# method 放在里面会报错 index must be monotonic increasing or decreasing\n",
    "frame.reindex(index = ['a', 'b', 'c', 'd'],\n",
    "             #method = 'ffill',\n",
    "             columns = ['Shenzhen', 'Hangzhou', 'Chongqing']).ffill()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0      blue\n",
      "4    yellow\n",
      "dtype: object\n"
     ]
    }
   ],
   "source": [
    "#下面介绍如何用drop来删除Series和DataFrame中的index\n",
    "obj4 = obj3.drop(2)\n",
    "print(obj4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    blue\n",
       "dtype: object"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj3.drop([2, 4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Hangzhou  Shenzhen  Nanjing\n",
       "a         0         1        2\n",
       "c         3         4        5\n",
       "d         6         7        8"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Shenzhen</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Hangzhou  Shenzhen  Nanjing\n",
       "d         6         7        8"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.drop(['a','c'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Hangzhou</th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>6</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Hangzhou  Nanjing\n",
       "a         0        2\n",
       "c         3        5\n",
       "d         6        8"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.drop('Shenzhen',axis = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nanjing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Nanjing\n",
       "a        2\n",
       "c        5\n",
       "d        8"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#drop不仅仅可以删除行，还可以删除列\n",
    "frame.drop(['Shenzhen','Hangzhou'],axis = 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### hierarchical index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a  1   -0.858385\n",
       "   2    0.580195\n",
       "   3   -0.312629\n",
       "b  1    0.487475\n",
       "   2    1.203145\n",
       "c  1   -0.144067\n",
       "   2    1.643179\n",
       "   3    0.525071\n",
       "d  1   -0.445105\n",
       "   2    1.049314\n",
       "dtype: float64"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.Series(np.random.randn(10),index = [['a','a','a','b','b','c','c','c','d','d'],\\\n",
    "                                             [1, 2, 3, 1, 2, 1, 2, 3, 1, 2]])\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex(levels=[[u'a', u'b', u'c', u'd'], [1, 2, 3]],\n",
       "           labels=[[0, 0, 0, 1, 1, 2, 2, 2, 3, 3], [0, 1, 2, 0, 1, 0, 1, 2, 0, 1]])"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1    0.487475\n",
      "2    1.203145\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(data[\"b\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b  1    0.487475\n",
       "   2    1.203145\n",
       "c  1   -0.144067\n",
       "   2    1.643179\n",
       "   3    0.525071\n",
       "d  1   -0.445105\n",
       "   2    1.049314\n",
       "dtype: float64"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['b':'d']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a  2    0.580195\n",
       "   3   -0.312629\n",
       "b  1    0.487475\n",
       "dtype: float64"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[1:4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "unstack和stack可以帮助我们在hierarchical indexing和DataFrame之间进行切换"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>-0.858385</td>\n",
       "      <td>0.580195</td>\n",
       "      <td>-0.312629</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>0.487475</td>\n",
       "      <td>1.203145</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>-0.144067</td>\n",
       "      <td>1.643179</td>\n",
       "      <td>0.525071</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>-0.445105</td>\n",
       "      <td>1.049314</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          1         2         3\n",
       "a -0.858385  0.580195 -0.312629\n",
       "b  0.487475  1.203145       NaN\n",
       "c -0.144067  1.643179  0.525071\n",
       "d -0.445105  1.049314       NaN"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.unstack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(data.unstack())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a  1   -0.858385\n",
       "   2    0.580195\n",
       "   3   -0.312629\n",
       "b  1    0.487475\n",
       "   2    1.203145\n",
       "c  1   -0.144067\n",
       "   2    1.643179\n",
       "   3    0.525071\n",
       "d  1   -0.445105\n",
       "   2    1.049314\n",
       "dtype: float64"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.unstack().stack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Beijing      Shanghai\n",
      "       apts cars     apts\n",
      "a 1       0    1        2\n",
      "  2       3    4        5\n",
      "b 1       6    7        8\n",
      "  2       9   10       11\n"
     ]
    }
   ],
   "source": [
    "#DataFrame的hierarchical indexing\n",
    "frame = pd.DataFrame(np.arange(12).reshape((4,3)),\n",
    "                     index = [['a','a','b','b'],[1, 2, 1, 2]],\n",
    "                    columns = [['Beijing','Beijing','Shanghai'],['apts','cars','apts']])\n",
    "print(frame)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "city         Beijing      Shanghai\n",
      "type            apts cars     apts\n",
      "alpha number                      \n",
      "a     1            0    1        2\n",
      "      2            3    4        5\n",
      "b     1            6    7        8\n",
      "      2            9   10       11\n"
     ]
    }
   ],
   "source": [
    "frame.index.names = ['alpha', 'number']\n",
    "frame.columns.names = ['city','type']\n",
    "print(frame)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "city      type\n",
       "Beijing   apts    0\n",
       "          cars    1\n",
       "Shanghai  apts    2\n",
       "Name: (a, 1), dtype: int32"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.ix['a', 1]\n",
    "#print(type(frame.ix['a', 1]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.ix['a', 2]['Beijing']['apts']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 关于Merge, Join和Concatenate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           apts    cars\n",
       "Shanghai  55000  200000\n",
       "Beijing   60000  300000"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [200000, 300000],}, index = ['Shanghai', 'Beijing'])\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "           apts    cars\n",
      "Hangzhou  25000  150000\n",
      "Nanjing   20000  120000\n"
     ]
    }
   ],
   "source": [
    "df2 = pd.DataFrame({'apts': [25000, 20000],'cars': [150000, 120000],}, index = ['Hangzhou', 'Nanjing'])\n",
    "print(df2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            apts    cars\n",
      "Guangzhou  30000  190000\n",
      "Chongqing  10000  100000\n"
     ]
    }
   ],
   "source": [
    "df3 = pd.DataFrame({'apts': [30000, 10000],'cars': [190000, 100000],}, index = ['Guangzhou', 'Chongqing'])\n",
    "print(df3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Hangzhou</th>\n",
       "      <td>25000</td>\n",
       "      <td>150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Nanjing</th>\n",
       "      <td>20000</td>\n",
       "      <td>120000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guangzhou</th>\n",
       "      <td>30000</td>\n",
       "      <td>190000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chongqing</th>\n",
       "      <td>10000</td>\n",
       "      <td>100000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            apts    cars\n",
       "Shanghai   55000  200000\n",
       "Beijing    60000  300000\n",
       "Hangzhou   25000  150000\n",
       "Nanjing    20000  120000\n",
       "Guangzhou  30000  190000\n",
       "Chongqing  10000  100000"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frames = [df1, df2, df3]\n",
    "result = pd.concat(frames)\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在concatenate的时候可以指定keys，这样可以给每一个部分加上一个Key。\n",
    "\n",
    "以下的例子就构造了一个hierarchical index。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              apts    cars\n",
      "x Shanghai   55000  200000\n",
      "  Beijing    60000  300000\n",
      "y Hangzhou   25000  150000\n",
      "  Nanjing    20000  120000\n",
      "z Guangzhou  30000  190000\n",
      "  Chongqing  10000  100000\n"
     ]
    }
   ],
   "source": [
    "result2 = pd.concat(frames, keys = ['x', 'y', 'z'])\n",
    "print(result2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>y</th>\n",
       "      <th>Nanjing</th>\n",
       "      <td>20000</td>\n",
       "      <td>120000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">z</th>\n",
       "      <th>Guangzhou</th>\n",
       "      <td>30000</td>\n",
       "      <td>190000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chongqing</th>\n",
       "      <td>10000</td>\n",
       "      <td>100000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              apts    cars\n",
       "y Nanjing    20000  120000\n",
       "z Guangzhou  30000  190000\n",
       "  Chongqing  10000  100000"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result2.ix[3: 6]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "           salaries\n",
      "Suzhou        10000\n",
      "Beijing       30000\n",
      "Shanghai      30000\n",
      "Guangzhou     20000\n",
      "Tianjin       15000\n"
     ]
    }
   ],
   "source": [
    "df4 = pd.DataFrame({'salaries': [10000, 30000, 30000, 20000, 15000]}, index = ['Suzhou', 'Beijing', 'Shanghai', 'Guangzhou', 'Tianjin'])\n",
    "print(df4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version\n",
      "of pandas will change to not sort by default.\n",
      "\n",
      "To accept the future behavior, pass 'sort=True'.\n",
      "\n",
      "To retain the current behavior and silence the warning, pass sort=False\n",
      "\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>salaries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000.0</td>\n",
       "      <td>300000.0</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chongqing</th>\n",
       "      <td>10000.0</td>\n",
       "      <td>100000.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guangzhou</th>\n",
       "      <td>30000.0</td>\n",
       "      <td>190000.0</td>\n",
       "      <td>20000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Hangzhou</th>\n",
       "      <td>25000.0</td>\n",
       "      <td>150000.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Nanjing</th>\n",
       "      <td>20000.0</td>\n",
       "      <td>120000.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000.0</td>\n",
       "      <td>200000.0</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Suzhou</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tianjin</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              apts      cars  salaries\n",
       "Beijing    60000.0  300000.0   30000.0\n",
       "Chongqing  10000.0  100000.0       NaN\n",
       "Guangzhou  30000.0  190000.0   20000.0\n",
       "Hangzhou   25000.0  150000.0       NaN\n",
       "Nanjing    20000.0  120000.0       NaN\n",
       "Shanghai   55000.0  200000.0   30000.0\n",
       "Suzhou         NaN       NaN   10000.0\n",
       "Tianjin        NaN       NaN   15000.0"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result3 = pd.concat([result, df4], axis = 1)\n",
    "result3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Beijing    apts         60000.0\n",
       "           cars        300000.0\n",
       "           salaries     30000.0\n",
       "Chongqing  apts         10000.0\n",
       "           cars        100000.0\n",
       "Guangzhou  apts         30000.0\n",
       "           cars        190000.0\n",
       "           salaries     20000.0\n",
       "Hangzhou   apts         25000.0\n",
       "           cars        150000.0\n",
       "Nanjing    apts         20000.0\n",
       "           cars        120000.0\n",
       "Shanghai   apts         55000.0\n",
       "           cars        200000.0\n",
       "           salaries     30000.0\n",
       "Suzhou     salaries     10000.0\n",
       "Tianjin    salaries     15000.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result3.stack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>salaries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "      <td>30000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "      <td>30000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guangzhou</th>\n",
       "      <td>30000</td>\n",
       "      <td>190000</td>\n",
       "      <td>20000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            apts    cars  salaries\n",
       "Shanghai   55000  200000     30000\n",
       "Beijing    60000  300000     30000\n",
       "Guangzhou  30000  190000     20000"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result3 = pd.concat([result, df4], axis = 1, join = 'inner')\n",
    "result3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Hangzhou</th>\n",
       "      <td>25000</td>\n",
       "      <td>150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Nanjing</th>\n",
       "      <td>20000</td>\n",
       "      <td>120000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           apts    cars\n",
       "Shanghai  55000  200000\n",
       "Beijing   60000  300000\n",
       "Hangzhou  25000  150000\n",
       "Nanjing   20000  120000"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#用append来做concatenation\n",
    "df1.append(df2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\pandas\\core\\frame.py:6201: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version\n",
      "of pandas will change to not sort by default.\n",
      "\n",
      "To accept the future behavior, pass 'sort=True'.\n",
      "\n",
      "To retain the current behavior and silence the warning, pass sort=False\n",
      "\n",
      "  sort=sort)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>salaries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000.0</td>\n",
       "      <td>200000.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000.0</td>\n",
       "      <td>300000.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Suzhou</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guangzhou</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tianjin</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              apts      cars  salaries\n",
       "Shanghai   55000.0  200000.0       NaN\n",
       "Beijing    60000.0  300000.0       NaN\n",
       "Suzhou         NaN       NaN   10000.0\n",
       "Beijing        NaN       NaN   30000.0\n",
       "Shanghai       NaN       NaN   30000.0\n",
       "Guangzhou      NaN       NaN   20000.0\n",
       "Tianjin        NaN       NaN   15000.0"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.append(df4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Shanghai    60\n",
      "Beijing     50\n",
      "Name: meal, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "#Series和DataFrame还可以被一起concatenate，这时候Series会先被转成DataFrame然后做Join，因为Series本来就是一个只有一维的DataFrame对吧。\n",
    "s1 = pd.Series([60, 50], index=['Shanghai','Beijing'], name = 'meal')\n",
    "print(s1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           apts    cars\n",
       "Shanghai  55000  200000\n",
       "Beijing   60000  300000"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>meal</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           apts    cars  meal\n",
       "Shanghai  55000  200000    60\n",
       "Beijing   60000  300000    50"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, s1], axis = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "apts     18000\n",
       "cars    120000\n",
       "Name: Xiamen, dtype: int64"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2 = pd.Series([18000, 120000], index = ['apts', 'cars'], name = 'Xiamen')\n",
    "s2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Xiamen</th>\n",
       "      <td>18000</td>\n",
       "      <td>120000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           apts    cars\n",
       "Shanghai  55000  200000\n",
       "Beijing   60000  300000\n",
       "Xiamen    18000  120000"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.append(s2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Merge(Join)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    apts    cars    cities\n",
      "0  55000  200000  Shanghai\n",
      "1  60000  300000   Beijing\n",
      "2  58000  250000  Shenzhen\n"
     ]
    }
   ],
   "source": [
    "df1 = pd.DataFrame({'apts': [55000, 60000, 58000],'cars': [200000, 300000, 250000],'cities':['Shanghai','Beijing','Shenzhen']})\n",
    "print(df1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      cities  salaries\n",
      "0     Suzhou     10000\n",
      "1    Beijing     30000\n",
      "2   Shanghai     30000\n",
      "3  Guangzhou     20000\n",
      "4    Tianjin     15000\n"
     ]
    }
   ],
   "source": [
    "df4 = pd.DataFrame({'salaries': [10000, 30000, 30000, 20000, 15000], 'cities': ['Suzhou', 'Beijing', 'Shanghai', 'Guangzhou', 'Tianjin']})\n",
    "print(df4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>cities</th>\n",
       "      <th>salaries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>30000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "      <td>Beijing</td>\n",
       "      <td>30000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    apts    cars    cities  salaries\n",
       "0  55000  200000  Shanghai     30000\n",
       "1  60000  300000   Beijing     30000"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = pd.merge(df1, df4, on = 'cities')\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>cities</th>\n",
       "      <th>salaries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>55000.0</td>\n",
       "      <td>200000.0</td>\n",
       "      <td>Shanghai</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>60000.0</td>\n",
       "      <td>300000.0</td>\n",
       "      <td>Beijing</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>58000.0</td>\n",
       "      <td>250000.0</td>\n",
       "      <td>Shenzhen</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Suzhou</td>\n",
       "      <td>10000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Guangzhou</td>\n",
       "      <td>20000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Tianjin</td>\n",
       "      <td>15000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      apts      cars     cities  salaries\n",
       "0  55000.0  200000.0   Shanghai   30000.0\n",
       "1  60000.0  300000.0    Beijing   30000.0\n",
       "2  58000.0  250000.0   Shenzhen       NaN\n",
       "3      NaN       NaN     Suzhou   10000.0\n",
       "4      NaN       NaN  Guangzhou   20000.0\n",
       "5      NaN       NaN    Tianjin   15000.0"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result2 = pd.merge(df1, df4, on = 'cities', how = 'outer')\n",
    "result2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "           apts    cars\n",
      "Shanghai  55000  200000\n",
      "Beijing   60000  300000\n",
      "Shenzhen  58000  250000\n"
     ]
    }
   ],
   "source": [
    "# join on index\n",
    "df1 = pd.DataFrame({'apts':[55000, 60000, 58000], 'cars': [200000, 300000, 250000]}, index = ['Shanghai', 'Beijing', 'Shenzhen'])\n",
    "print(df1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "           salaries\n",
      "Suzhou        10000\n",
      "Beijing       30000\n",
      "Shanghai      30000\n",
      "Guangzhou     20000\n",
      "Tianjin       15000\n"
     ]
    }
   ],
   "source": [
    "df4 = pd.DataFrame({'salaries':[10000, 30000, 30000, 20000, 15000]}, index = ['Suzhou', 'Beijing', 'Shanghai', 'Guangzhou', 'Tianjin'])\n",
    "print(df4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>salaries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000</td>\n",
       "      <td>200000</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000</td>\n",
       "      <td>300000</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shenzhen</th>\n",
       "      <td>58000</td>\n",
       "      <td>250000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           apts    cars  salaries\n",
       "Shanghai  55000  200000   30000.0\n",
       "Beijing   60000  300000   30000.0\n",
       "Shenzhen  58000  250000       NaN"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.join(df4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>salaries</th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Suzhou</th>\n",
       "      <td>10000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>30000</td>\n",
       "      <td>60000.0</td>\n",
       "      <td>300000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>30000</td>\n",
       "      <td>55000.0</td>\n",
       "      <td>200000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guangzhou</th>\n",
       "      <td>20000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tianjin</th>\n",
       "      <td>15000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           salaries     apts      cars\n",
       "Suzhou        10000      NaN       NaN\n",
       "Beijing       30000  60000.0  300000.0\n",
       "Shanghai      30000  55000.0  200000.0\n",
       "Guangzhou     20000      NaN       NaN\n",
       "Tianjin       15000      NaN       NaN"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.join(df1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>salaries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000.0</td>\n",
       "      <td>300000.0</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guangzhou</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000.0</td>\n",
       "      <td>200000.0</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shenzhen</th>\n",
       "      <td>58000.0</td>\n",
       "      <td>250000.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Suzhou</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tianjin</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              apts      cars  salaries\n",
       "Beijing    60000.0  300000.0   30000.0\n",
       "Guangzhou      NaN       NaN   20000.0\n",
       "Shanghai   55000.0  200000.0   30000.0\n",
       "Shenzhen   58000.0  250000.0       NaN\n",
       "Suzhou         NaN       NaN   10000.0\n",
       "Tianjin        NaN       NaN   15000.0"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.join(df4, how='outer')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>apts</th>\n",
       "      <th>cars</th>\n",
       "      <th>salaries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Beijing</th>\n",
       "      <td>60000.0</td>\n",
       "      <td>300000.0</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guangzhou</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shanghai</th>\n",
       "      <td>55000.0</td>\n",
       "      <td>200000.0</td>\n",
       "      <td>30000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shenzhen</th>\n",
       "      <td>58000.0</td>\n",
       "      <td>250000.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Suzhou</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tianjin</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              apts      cars  salaries\n",
       "Beijing    60000.0  300000.0   30000.0\n",
       "Guangzhou      NaN       NaN   20000.0\n",
       "Shanghai   55000.0  200000.0   30000.0\n",
       "Shenzhen   58000.0  250000.0       NaN\n",
       "Suzhou         NaN       NaN   10000.0\n",
       "Tianjin        NaN       NaN   15000.0"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#也可以用merge来写\n",
    "pd.merge(df1, df4, left_index = True,right_index = True,how = 'outer')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Group By"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Bonus  Name  Salary  Year\n",
      "0   3000  July   10000  2016\n",
      "1   1000   Chu    2000  2016\n",
      "2   1000   Chu    4000  2016\n",
      "3   1200   Lin    5000  2016\n",
      "4   4000  July   18000  2017\n",
      "5   2300  July   25000  2017\n",
      "6    500   Chu    3000  2017\n",
      "7   1000  July    4000  2017\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "salaries = pd.DataFrame({\n",
    "    'Name':['July', 'Chu', 'Chu', 'Lin', 'July', 'July', 'Chu', 'July'],\n",
    "    'Year':[2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],\n",
    "    'Salary': [10000,2000,4000,5000,18000,25000,3000,4000],\n",
    "    'Bonus': [3000,1000,1000,1200,4000,2300,500,1000]\n",
    "})\n",
    "print(salaries)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x00000000090E7EB8>"
      ]
     },
     "execution_count": 128,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group_by_name = salaries.groupby('Name')\n",
    "group_by_name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### groupby经常和aggregate一起使用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Bonus</th>\n",
       "      <th>Salary</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Chu</th>\n",
       "      <td>2500</td>\n",
       "      <td>9000</td>\n",
       "      <td>6049</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>10300</td>\n",
       "      <td>57000</td>\n",
       "      <td>8067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lin</th>\n",
       "      <td>1200</td>\n",
       "      <td>5000</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Bonus  Salary  Year\n",
       "Name                     \n",
       "Chu    2500    9000  6049\n",
       "July  10300   57000  8067\n",
       "Lin    1200    5000  2016"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group_by_name.aggregate(sum)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Bonus</th>\n",
       "      <th>Salary</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Chu</th>\n",
       "      <td>2500</td>\n",
       "      <td>9000</td>\n",
       "      <td>6049</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>10300</td>\n",
       "      <td>57000</td>\n",
       "      <td>8067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lin</th>\n",
       "      <td>1200</td>\n",
       "      <td>5000</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Bonus  Salary  Year\n",
       "Name                     \n",
       "Chu    2500    9000  6049\n",
       "July  10300   57000  8067\n",
       "Lin    1200    5000  2016"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group_by_name.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Bonus</th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Chu</th>\n",
       "      <th>2016</th>\n",
       "      <td>2000</td>\n",
       "      <td>6000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>500</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">July</th>\n",
       "      <th>2016</th>\n",
       "      <td>3000</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>7300</td>\n",
       "      <td>47000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lin</th>\n",
       "      <th>2016</th>\n",
       "      <td>1200</td>\n",
       "      <td>5000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Bonus  Salary\n",
       "Name Year               \n",
       "Chu  2016   2000    6000\n",
       "     2017    500    3000\n",
       "July 2016   3000   10000\n",
       "     2017   7300   47000\n",
       "Lin  2016   1200    5000"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#也可以写成\n",
    "group_by_name_year = salaries.groupby(['Name', 'Year'])\n",
    "group_by_name_year.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name  Year\n",
       "Chu   2016    2\n",
       "      2017    1\n",
       "July  2016    1\n",
       "      2017    3\n",
       "Lin   2016    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group_by_name_year.size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Bonus</th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Chu</th>\n",
       "      <th>2016</th>\n",
       "      <td>1000</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>500</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">July</th>\n",
       "      <th>2016</th>\n",
       "      <td>3000</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>4000</td>\n",
       "      <td>25000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lin</th>\n",
       "      <th>2016</th>\n",
       "      <td>1200</td>\n",
       "      <td>5000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Bonus  Salary\n",
       "Name Year               \n",
       "Chu  2016   1000    4000\n",
       "     2017    500    3000\n",
       "July 2016   3000   10000\n",
       "     2017   4000   25000\n",
       "Lin  2016   1200    5000"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group_by_name_year.max() #min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"8\" halign=\"left\">Bonus</th>\n",
       "      <th colspan=\"8\" halign=\"left\">Salary</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Chu</th>\n",
       "      <th>2016</th>\n",
       "      <td>2.0</td>\n",
       "      <td>1000.000000</td>\n",
       "      <td>0.00000</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3000.000000</td>\n",
       "      <td>1414.213562</td>\n",
       "      <td>2000.0</td>\n",
       "      <td>2500.0</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>4000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>1.0</td>\n",
       "      <td>500.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>500.0</td>\n",
       "      <td>500.0</td>\n",
       "      <td>500.0</td>\n",
       "      <td>500.0</td>\n",
       "      <td>500.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3000.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">July</th>\n",
       "      <th>2016</th>\n",
       "      <td>1.0</td>\n",
       "      <td>3000.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>10000.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>3.0</td>\n",
       "      <td>2433.333333</td>\n",
       "      <td>1504.43788</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>1650.0</td>\n",
       "      <td>2300.0</td>\n",
       "      <td>3150.0</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>15666.666667</td>\n",
       "      <td>10692.676622</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>11000.0</td>\n",
       "      <td>18000.0</td>\n",
       "      <td>21500.0</td>\n",
       "      <td>25000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lin</th>\n",
       "      <th>2016</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1200.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5000.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5000.0</td>\n",
       "      <td>5000.0</td>\n",
       "      <td>5000.0</td>\n",
       "      <td>5000.0</td>\n",
       "      <td>5000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          Bonus                                                           \\\n",
       "          count         mean         std     min     25%     50%     75%   \n",
       "Name Year                                                                  \n",
       "Chu  2016   2.0  1000.000000     0.00000  1000.0  1000.0  1000.0  1000.0   \n",
       "     2017   1.0   500.000000         NaN   500.0   500.0   500.0   500.0   \n",
       "July 2016   1.0  3000.000000         NaN  3000.0  3000.0  3000.0  3000.0   \n",
       "     2017   3.0  2433.333333  1504.43788  1000.0  1650.0  2300.0  3150.0   \n",
       "Lin  2016   1.0  1200.000000         NaN  1200.0  1200.0  1200.0  1200.0   \n",
       "\n",
       "                  Salary                                                \\\n",
       "              max  count          mean           std      min      25%   \n",
       "Name Year                                                                \n",
       "Chu  2016  1000.0    2.0   3000.000000   1414.213562   2000.0   2500.0   \n",
       "     2017   500.0    1.0   3000.000000           NaN   3000.0   3000.0   \n",
       "July 2016  3000.0    1.0  10000.000000           NaN  10000.0  10000.0   \n",
       "     2017  4000.0    3.0  15666.666667  10692.676622   4000.0  11000.0   \n",
       "Lin  2016  1200.0    1.0   5000.000000           NaN   5000.0   5000.0   \n",
       "\n",
       "                                      \n",
       "               50%      75%      max  \n",
       "Name Year                             \n",
       "Chu  2016   3000.0   3500.0   4000.0  \n",
       "     2017   3000.0   3000.0   3000.0  \n",
       "July 2016  10000.0  10000.0  10000.0  \n",
       "     2017  18000.0  21500.0  25000.0  \n",
       "Lin  2016   5000.0   5000.0   5000.0  "
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group_by_name_year.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### WRITE TO CSV"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "      <th>Brébeuf (données non disponibles)</th>\n",
       "      <th>Côte-Sainte-Catherine</th>\n",
       "      <th>Maisonneuve 1</th>\n",
       "      <th>Maisonneuve 2</th>\n",
       "      <th>du Parc</th>\n",
       "      <th>Pierre-Dupuy</th>\n",
       "      <th>Rachel1</th>\n",
       "      <th>St-Urbain (données non disponibles)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>35</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>38</td>\n",
       "      <td>51</td>\n",
       "      <td>26</td>\n",
       "      <td>10</td>\n",
       "      <td>16</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>83</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>68</td>\n",
       "      <td>153</td>\n",
       "      <td>53</td>\n",
       "      <td>6</td>\n",
       "      <td>43</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>135</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>104</td>\n",
       "      <td>248</td>\n",
       "      <td>89</td>\n",
       "      <td>3</td>\n",
       "      <td>58</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>144</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>116</td>\n",
       "      <td>318</td>\n",
       "      <td>111</td>\n",
       "      <td>8</td>\n",
       "      <td>61</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>197</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>124</td>\n",
       "      <td>330</td>\n",
       "      <td>97</td>\n",
       "      <td>13</td>\n",
       "      <td>95</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Berri 1 Brébeuf (données non disponibles)  Côte-Sainte-Catherine  \\\n",
       "Date                                                                           \n",
       "2012-01-01       35                               NaN                      0   \n",
       "2012-01-02       83                               NaN                      1   \n",
       "2012-01-03      135                               NaN                      2   \n",
       "2012-01-04      144                               NaN                      1   \n",
       "2012-01-05      197                               NaN                      2   \n",
       "\n",
       "            Maisonneuve 1  Maisonneuve 2  du Parc  Pierre-Dupuy  Rachel1  \\\n",
       "Date                                                                       \n",
       "2012-01-01             38             51       26            10       16   \n",
       "2012-01-02             68            153       53             6       43   \n",
       "2012-01-03            104            248       89             3       58   \n",
       "2012-01-04            116            318      111             8       61   \n",
       "2012-01-05            124            330       97            13       95   \n",
       "\n",
       "           St-Urbain (données non disponibles)  \n",
       "Date                                            \n",
       "2012-01-01                                 NaN  \n",
       "2012-01-02                                 NaN  \n",
       "2012-01-03                                 NaN  \n",
       "2012-01-04                                 NaN  \n",
       "2012-01-05                                 NaN  "
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bikes = pd.DataFrame({'Berri 1': [35, 83, 135, 144, 197],\n",
    "                      'Côte-Sainte-Catherine': [0, 1, 2, 1, 2],\n",
    "                     'Maisonneuve 1': [38, 68, 104, 116, 124],\n",
    "                     'Maisonneuve 2': [51, 153, 248, 318, 330],\n",
    "                     'du Parc': [26, 53, 89, 111, 97],\n",
    "                     'Pierre-Dupuy': [10, 6, 3, 8, 13],\n",
    "                     'Rachel1': [16, 43, 58, 61, 95]},\n",
    "                     index = ['2012-01-01','2012-01-02','2012-01-03','2012-01-04','2012-01-05'],\n",
    "                     columns = ['Berri 1','Brébeuf (données non disponibles)','Côte-Sainte-Catherine','Maisonneuve 1','Maisonneuve 2','du Parc','Pierre-Dupuy','Rachel1','St-Urbain (données non disponibles)'],\n",
    "                    )\n",
    "bikes.index.name = 'Date'\n",
    "bikes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {},
   "outputs": [],
   "source": [
    "bikes.to_csv('./3/bikes.csv', encoding='latin1', sep=';')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "      <th>Brébeuf (données non disponibles)</th>\n",
       "      <th>Côte-Sainte-Catherine</th>\n",
       "      <th>Maisonneuve 1</th>\n",
       "      <th>Maisonneuve 2</th>\n",
       "      <th>du Parc</th>\n",
       "      <th>Pierre-Dupuy</th>\n",
       "      <th>Rachel1</th>\n",
       "      <th>St-Urbain (données non disponibles)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>35</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>38</td>\n",
       "      <td>51</td>\n",
       "      <td>26</td>\n",
       "      <td>10</td>\n",
       "      <td>16</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>83</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>68</td>\n",
       "      <td>153</td>\n",
       "      <td>53</td>\n",
       "      <td>6</td>\n",
       "      <td>43</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>135</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>104</td>\n",
       "      <td>248</td>\n",
       "      <td>89</td>\n",
       "      <td>3</td>\n",
       "      <td>58</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>144</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>116</td>\n",
       "      <td>318</td>\n",
       "      <td>111</td>\n",
       "      <td>8</td>\n",
       "      <td>61</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>197</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>124</td>\n",
       "      <td>330</td>\n",
       "      <td>97</td>\n",
       "      <td>13</td>\n",
       "      <td>95</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Berri 1  Brébeuf (données non disponibles)  Côte-Sainte-Catherine  \\\n",
       "Date                                                                            \n",
       "2012-01-01       35                                NaN                      0   \n",
       "2012-01-02       83                                NaN                      1   \n",
       "2012-01-03      135                                NaN                      2   \n",
       "2012-01-04      144                                NaN                      1   \n",
       "2012-01-05      197                                NaN                      2   \n",
       "\n",
       "            Maisonneuve 1  Maisonneuve 2  du Parc  Pierre-Dupuy  Rachel1  \\\n",
       "Date                                                                       \n",
       "2012-01-01             38             51       26            10       16   \n",
       "2012-01-02             68            153       53             6       43   \n",
       "2012-01-03            104            248       89             3       58   \n",
       "2012-01-04            116            318      111             8       61   \n",
       "2012-01-05            124            330       97            13       95   \n",
       "\n",
       "            St-Urbain (données non disponibles)  \n",
       "Date                                             \n",
       "2012-01-01                                  NaN  \n",
       "2012-01-02                                  NaN  \n",
       "2012-01-03                                  NaN  \n",
       "2012-01-04                                  NaN  \n",
       "2012-01-05                                  NaN  "
      ]
     },
     "execution_count": 141,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bikes = pd.read_csv('./3/bikes.csv', encoding='latin1', sep=';', \n",
    "                    parse_dates=['Date'], dayfirst=True, index_col='Date')\n",
    "bikes.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "      <th>Brébeuf (données non disponibles)</th>\n",
       "      <th>Côte-Sainte-Catherine</th>\n",
       "      <th>Maisonneuve 1</th>\n",
       "      <th>Maisonneuve 2</th>\n",
       "      <th>du Parc</th>\n",
       "      <th>Pierre-Dupuy</th>\n",
       "      <th>Rachel1</th>\n",
       "      <th>St-Urbain (données non disponibles)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [Berri 1, Brébeuf (données non disponibles), Côte-Sainte-Catherine, Maisonneuve 1, Maisonneuve 2, du Parc, Pierre-Dupuy, Rachel1, St-Urbain (données non disponibles)]\n",
       "Index: []"
      ]
     },
     "execution_count": 142,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bikes.dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "      <th>Brébeuf (données non disponibles)</th>\n",
       "      <th>Côte-Sainte-Catherine</th>\n",
       "      <th>Maisonneuve 1</th>\n",
       "      <th>Maisonneuve 2</th>\n",
       "      <th>du Parc</th>\n",
       "      <th>Pierre-Dupuy</th>\n",
       "      <th>Rachel1</th>\n",
       "      <th>St-Urbain (données non disponibles)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>35</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>38</td>\n",
       "      <td>51</td>\n",
       "      <td>26</td>\n",
       "      <td>10</td>\n",
       "      <td>16</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>83</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>68</td>\n",
       "      <td>153</td>\n",
       "      <td>53</td>\n",
       "      <td>6</td>\n",
       "      <td>43</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>135</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>104</td>\n",
       "      <td>248</td>\n",
       "      <td>89</td>\n",
       "      <td>3</td>\n",
       "      <td>58</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>144</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>116</td>\n",
       "      <td>318</td>\n",
       "      <td>111</td>\n",
       "      <td>8</td>\n",
       "      <td>61</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>197</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>124</td>\n",
       "      <td>330</td>\n",
       "      <td>97</td>\n",
       "      <td>13</td>\n",
       "      <td>95</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Berri 1  Brébeuf (données non disponibles)  Côte-Sainte-Catherine  \\\n",
       "Date                                                                            \n",
       "2012-01-01       35                                NaN                      0   \n",
       "2012-01-02       83                                NaN                      1   \n",
       "2012-01-03      135                                NaN                      2   \n",
       "2012-01-04      144                                NaN                      1   \n",
       "2012-01-05      197                                NaN                      2   \n",
       "\n",
       "            Maisonneuve 1  Maisonneuve 2  du Parc  Pierre-Dupuy  Rachel1  \\\n",
       "Date                                                                       \n",
       "2012-01-01             38             51       26            10       16   \n",
       "2012-01-02             68            153       53             6       43   \n",
       "2012-01-03            104            248       89             3       58   \n",
       "2012-01-04            116            318      111             8       61   \n",
       "2012-01-05            124            330       97            13       95   \n",
       "\n",
       "            St-Urbain (données non disponibles)  \n",
       "Date                                             \n",
       "2012-01-01                                  NaN  \n",
       "2012-01-02                                  NaN  \n",
       "2012-01-03                                  NaN  \n",
       "2012-01-04                                  NaN  \n",
       "2012-01-05                                  NaN  "
      ]
     },
     "execution_count": 143,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bikes.dropna(how='all').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "      <th>Côte-Sainte-Catherine</th>\n",
       "      <th>Maisonneuve 1</th>\n",
       "      <th>Maisonneuve 2</th>\n",
       "      <th>du Parc</th>\n",
       "      <th>Pierre-Dupuy</th>\n",
       "      <th>Rachel1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>35</td>\n",
       "      <td>0</td>\n",
       "      <td>38</td>\n",
       "      <td>51</td>\n",
       "      <td>26</td>\n",
       "      <td>10</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>83</td>\n",
       "      <td>1</td>\n",
       "      <td>68</td>\n",
       "      <td>153</td>\n",
       "      <td>53</td>\n",
       "      <td>6</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>135</td>\n",
       "      <td>2</td>\n",
       "      <td>104</td>\n",
       "      <td>248</td>\n",
       "      <td>89</td>\n",
       "      <td>3</td>\n",
       "      <td>58</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>144</td>\n",
       "      <td>1</td>\n",
       "      <td>116</td>\n",
       "      <td>318</td>\n",
       "      <td>111</td>\n",
       "      <td>8</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>197</td>\n",
       "      <td>2</td>\n",
       "      <td>124</td>\n",
       "      <td>330</td>\n",
       "      <td>97</td>\n",
       "      <td>13</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Berri 1  Côte-Sainte-Catherine  Maisonneuve 1  Maisonneuve 2  \\\n",
       "Date                                                                       \n",
       "2012-01-01       35                      0             38             51   \n",
       "2012-01-02       83                      1             68            153   \n",
       "2012-01-03      135                      2            104            248   \n",
       "2012-01-04      144                      1            116            318   \n",
       "2012-01-05      197                      2            124            330   \n",
       "\n",
       "            du Parc  Pierre-Dupuy  Rachel1  \n",
       "Date                                        \n",
       "2012-01-01       26            10       16  \n",
       "2012-01-02       53             6       43  \n",
       "2012-01-03       89             3       58  \n",
       "2012-01-04      111             8       61  \n",
       "2012-01-05       97            13       95  "
      ]
     },
     "execution_count": 144,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bikes.dropna(how='all',axis=1).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>197</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Berri 1\n",
       "Date               \n",
       "2012-01-01       35\n",
       "2012-01-02       83\n",
       "2012-01-03      135\n",
       "2012-01-04      144\n",
       "2012-01-05      197"
      ]
     },
     "execution_count": 145,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "berri_bikes = bikes[['Berri 1']].copy()\n",
    "berri_bikes.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Int64Index([6, 0, 1, 2, 3], dtype='int64', name=u'Date')"
      ]
     },
     "execution_count": 146,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "berri_bikes.index.weekday"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "      <th>weekday</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>35</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>83</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>135</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>144</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>197</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Berri 1  weekday\n",
       "Date                        \n",
       "2012-01-01       35        6\n",
       "2012-01-02       83        0\n",
       "2012-01-03      135        1\n",
       "2012-01-04      144        2\n",
       "2012-01-05      197        3"
      ]
     },
     "execution_count": 147,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "berri_bikes.ix[:,'weekday'] = berri_bikes.index.weekday\n",
    "berri_bikes.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>weekday</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Berri 1\n",
       "weekday         \n",
       "0             83\n",
       "1            135\n",
       "2            144\n",
       "3            197\n",
       "6             35"
      ]
     },
     "execution_count": 148,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weekday_counts = berri_bikes.groupby('weekday').sum()\n",
    "weekday_counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 149,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "      <th>Brébeuf (données non disponibles)</th>\n",
       "      <th>Côte-Sainte-Catherine</th>\n",
       "      <th>Maisonneuve 1</th>\n",
       "      <th>Maisonneuve 2</th>\n",
       "      <th>du Parc</th>\n",
       "      <th>Pierre-Dupuy</th>\n",
       "      <th>Rachel1</th>\n",
       "      <th>St-Urbain (données non disponibles)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>35</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>38</td>\n",
       "      <td>51</td>\n",
       "      <td>26</td>\n",
       "      <td>10</td>\n",
       "      <td>16</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>83</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>68</td>\n",
       "      <td>153</td>\n",
       "      <td>53</td>\n",
       "      <td>6</td>\n",
       "      <td>43</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>135</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>104</td>\n",
       "      <td>248</td>\n",
       "      <td>89</td>\n",
       "      <td>3</td>\n",
       "      <td>58</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>144</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>116</td>\n",
       "      <td>318</td>\n",
       "      <td>111</td>\n",
       "      <td>8</td>\n",
       "      <td>61</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>197</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>124</td>\n",
       "      <td>330</td>\n",
       "      <td>97</td>\n",
       "      <td>13</td>\n",
       "      <td>95</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Berri 1  Brébeuf (données non disponibles)  Côte-Sainte-Catherine  \\\n",
       "Date                                                                            \n",
       "2012-01-01       35                                NaN                      0   \n",
       "2012-01-02       83                                NaN                      1   \n",
       "2012-01-03      135                                NaN                      2   \n",
       "2012-01-04      144                                NaN                      1   \n",
       "2012-01-05      197                                NaN                      2   \n",
       "\n",
       "            Maisonneuve 1  Maisonneuve 2  du Parc  Pierre-Dupuy  Rachel1  \\\n",
       "Date                                                                       \n",
       "2012-01-01             38             51       26            10       16   \n",
       "2012-01-02             68            153       53             6       43   \n",
       "2012-01-03            104            248       89             3       58   \n",
       "2012-01-04            116            318      111             8       61   \n",
       "2012-01-05            124            330       97            13       95   \n",
       "\n",
       "            St-Urbain (données non disponibles)  \n",
       "Date                                             \n",
       "2012-01-01                                  NaN  \n",
       "2012-01-02                                  NaN  \n",
       "2012-01-03                                  NaN  \n",
       "2012-01-04                                  NaN  \n",
       "2012-01-05                                  NaN  "
      ]
     },
     "execution_count": 149,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#有了weekday信息之后，我们就可以用上我们前面学过的.groupyby把骑车人数按照weekday分类，然后用aggregate算出每个工作日的骑车人数之和。\n",
    "bikes.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Berri 1</th>\n",
       "      <th>Côte-Sainte-Catherine</th>\n",
       "      <th>Maisonneuve 1</th>\n",
       "      <th>Maisonneuve 2</th>\n",
       "      <th>du Parc</th>\n",
       "      <th>Pierre-Dupuy</th>\n",
       "      <th>Rachel1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>35</td>\n",
       "      <td>0</td>\n",
       "      <td>38</td>\n",
       "      <td>51</td>\n",
       "      <td>26</td>\n",
       "      <td>10</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>83</td>\n",
       "      <td>1</td>\n",
       "      <td>68</td>\n",
       "      <td>153</td>\n",
       "      <td>53</td>\n",
       "      <td>6</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>135</td>\n",
       "      <td>2</td>\n",
       "      <td>104</td>\n",
       "      <td>248</td>\n",
       "      <td>89</td>\n",
       "      <td>3</td>\n",
       "      <td>58</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>144</td>\n",
       "      <td>1</td>\n",
       "      <td>116</td>\n",
       "      <td>318</td>\n",
       "      <td>111</td>\n",
       "      <td>8</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>197</td>\n",
       "      <td>2</td>\n",
       "      <td>124</td>\n",
       "      <td>330</td>\n",
       "      <td>97</td>\n",
       "      <td>13</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Berri 1  Côte-Sainte-Catherine  Maisonneuve 1  Maisonneuve 2  \\\n",
       "Date                                                                       \n",
       "2012-01-01       35                      0             38             51   \n",
       "2012-01-02       83                      1             68            153   \n",
       "2012-01-03      135                      2            104            248   \n",
       "2012-01-04      144                      1            116            318   \n",
       "2012-01-05      197                      2            124            330   \n",
       "\n",
       "            du Parc  Pierre-Dupuy  Rachel1  \n",
       "Date                                        \n",
       "2012-01-01       26            10       16  \n",
       "2012-01-02       53             6       43  \n",
       "2012-01-03       89             3       58  \n",
       "2012-01-04      111             8       61  \n",
       "2012-01-05       97            13       95  "
      ]
     },
     "execution_count": 150,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bikes = bikes.dropna(axis = 1,how = 'all')\n",
    "bikes.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>407</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>639</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>759</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>858</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              0\n",
       "Date           \n",
       "2012-01-01  176\n",
       "2012-01-02  407\n",
       "2012-01-03  639\n",
       "2012-01-04  759\n",
       "2012-01-05  858"
      ]
     },
     "execution_count": 151,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bikes_sum = bikes.sum(axis = 1).to_frame()\n",
    "bikes_sum.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "F:\\Program\\Anaconda2\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n",
      ".ix is deprecated. Please use\n",
      ".loc for label based indexing or\n",
      ".iloc for positional indexing\n",
      "\n",
      "See the documentation here:\n",
      "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>weekday</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>176</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-02</th>\n",
       "      <td>407</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-03</th>\n",
       "      <td>639</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-04</th>\n",
       "      <td>759</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-05</th>\n",
       "      <td>858</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              0  weekday\n",
       "Date                    \n",
       "2012-01-01  176        6\n",
       "2012-01-02  407        0\n",
       "2012-01-03  639        1\n",
       "2012-01-04  759        2\n",
       "2012-01-05  858        3"
      ]
     },
     "execution_count": 152,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bikes_sum.ix[:,'weekday'] = bikes_sum.index.weekday\n",
    "bikes_sum.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来我们试试能不能把每条路线都加起来，然后算出一天骑自行车出门的人数之和。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Sunday</th>\n",
       "      <td>407</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Monday</th>\n",
       "      <td>639</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tuesday</th>\n",
       "      <td>759</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wednesday</th>\n",
       "      <td>858</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Saturday</th>\n",
       "      <td>176</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             0\n",
       "Sunday     407\n",
       "Monday     639\n",
       "Tuesday    759\n",
       "Wednesday  858\n",
       "Saturday   176"
      ]
     },
     "execution_count": 157,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weekday_counts = bikes_sum.groupby('weekday').aggregate(sum)\n",
    "weekday_counts\n",
    "# weekday_counts.index = ['Monday', 'Tuesday', \n",
    "#                         'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']\n",
    "weekday_counts.index = ['Sunday','Monday','Tuesday','Wednesday','Saturday']\n",
    "weekday_counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
